Researchers at Kaunas University of Technology (KTU) have developed an artificial intelligence model that helps identify depression based on speech and brain neural activity. This offers a new approach to depression diagnosis, according to a press release by KTU.
“Depression is one of the most common mental disorders, with devastating consequences for both the individual and society, so we are developing a new, more objective diagnostic method that could become accessible to everyone in the future,” says Rytis Maskeliūnas, a professor at KTU and one of the authors of the invention.
The scientists argue that while most diagnostic research for depression has traditionally relied on a single type of data, the new multimodal approach can provide better information about a person’s emotional state.
Impressive accuracy
The combination of speech and brain activity data achieved an impressive 97.53 percent accuracy in diagnosing depression, significantly outperforming alternative methods. “This is because the voice adds data to the study that we cannot yet extract from the brain,” explains Maskeliūnas.
According to Musyyab Yousufi, a PhD student at KTU who contributed to the invention, the choice of data was carefully considered: “While it is believed that facial expressions might reveal more about a person’s psychological state, this data is quite easy to falsify. We chose voice because it can subtly reveal an emotional state, with the disorder affecting the pace of speech, intonation, and overall energy.”

Moreover, unlike electrical brain activity (EEG) or voice data, the face can, to a certain extent, directly identify a person. “But we cannot violate patients’ privacy, and besides, collecting and combining data from several sources is more promising for further use,” says the professor at the KTU Faculty of Informatics (IF).
Maskeliūnas emphasises that the EEG dataset used for the research was obtained from the Multimodal Open Dataset for Mental Disorder Analysis (MODMA), as the KTU research group represents computer science and not the medical science field.
MODMA EEG data was collected and recorded for five minutes while participants were awake, at rest, and with their eyes closed. In the audio part of the experiment, the patients participated in a question-and-answer session and several activities focused on reading and describing pictures to capture their natural language and cognitive state.
AI diagnosis
The collected EEG and audio signals were transformed into spectrograms, allowing the data to be visualised. Special noise filters and pre-processing methods were applied to make the data noise-free and comparable, and a modified DenseNet-121 deep-learning model was used to identify signs of depression in the images. Each image reflected signal changes over time. The EEG showed waveforms of brain activity, and the sound showed frequency and intensity distributions.
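As a rough illustration of the first step described above, the snippet below turns a raw one-dimensional signal (an EEG channel or an audio clip) into a log-power spectrogram image. This is a minimal sketch using SciPy’s `signal.spectrogram`; the window length, filtering, and scaling here are illustrative assumptions, not the paper’s exact pre-processing.

```python
import numpy as np
from scipy import signal

def to_spectrogram(x, fs, nperseg=256):
    """Turn a 1-D signal into a log-power (dB) spectrogram array.

    x: raw samples (one EEG channel or an audio clip)
    fs: sampling rate in Hz
    """
    f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=nperseg)
    # Small epsilon avoids log(0) on silent or flat segments.
    return 10.0 * np.log10(Sxx + 1e-10)

# Example: one second of a 10 Hz sine, loosely mimicking an EEG alpha rhythm
fs = 1000
t = np.arange(fs) / fs
spec = to_spectrogram(np.sin(2 * np.pi * 10 * t), fs)
```

An image of this kind, with frequency on one axis and time on the other, is what a convolutional network such as DenseNet-121 consumes.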

The model included a custom classification layer trained to split the data into classes of healthy or depressed people.
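The article does not describe that layer’s exact architecture; as a loose sketch, a two-class head of this kind is just a dense layer followed by a softmax over {healthy, depressed}, written here in plain NumPy so the idea is visible without a deep-learning framework:

```python
import numpy as np

def classify(features, W, b):
    """Two-class head: dense layer plus softmax over {healthy, depressed}.

    features: (n_samples, n_features) array of CNN embeddings
    W: (n_features, 2) weight matrix; b: (2,) bias (both learned in training)
    """
    logits = features @ W + b
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Demo with random embeddings and random (untrained) weights
rng = np.random.default_rng(0)
probs = classify(rng.normal(size=(4, 16)), rng.normal(size=(16, 2)), np.zeros(2))
```

Each row of `probs` is a probability distribution over the two classes; the predicted class is simply the larger entry.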
In the future, this AI model could speed up the diagnosis of depression, or even make it possible remotely, and reduce the risk of subjective errors. That will require further clinical trials and improvements to the programme; the latter, Maskeliūnas adds, might raise some challenges.
“The main problem with these studies is lack of data because people tend to remain private about their mental health issues,” he says.
Another important aspect is getting the algorithm to provide information to a medical professional about what led to the diagnosis. “The algorithm still has to learn how to explain the diagnosis in a comprehensible way,” says Maskeliūnas.
According to the KTU professor, due to the growing demand for AI solutions that directly affect people in areas such as healthcare, finance, and the legal system, similar explainability requirements are becoming common.
This is why explainable artificial intelligence (XAI), which aims to explain to the user why the model makes certain decisions and to increase their trust in the AI, is now gaining momentum.
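One common model-agnostic way to produce such explanations (an assumption for illustration, not necessarily the technique the KTU group will adopt) is occlusion sensitivity: mask patches of the input spectrogram one at a time and record how much the model’s score drops, revealing which time-frequency regions drove the decision.

```python
import numpy as np

def occlusion_map(spec, predict, patch=8):
    """Mask square patches of a spectrogram and measure the score drop.

    spec: 2-D spectrogram array
    predict: callable returning a scalar score (e.g. 'depressed' probability)
    Returns a coarse heatmap with one cell per occluded patch.
    """
    base = predict(spec)
    h, w = spec.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, (h // patch) * patch, patch):
        for j in range(0, (w // patch) * patch, patch):
            masked = spec.copy()
            masked[i:i + patch, j:j + patch] = spec.mean()  # occlude the patch
            heat[i // patch, j // patch] = base - predict(masked)
    return heat

# Toy check: a "model" that only looks at the top-left quadrant
spec = np.zeros((16, 16))
spec[:8, :8] = 1.0
heat = occlusion_map(spec, lambda s: s[:8, :8].mean(), patch=8)
```

In the toy example the heatmap lights up only over the top-left patch, correctly flagging the region the “model” actually relies on.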
The article 'Multimodal Fusion of EEG and Audio Spectrogram for Major Depressive Disorder Recognition Using Modified DenseNet121' was published in the journal Brain Sciences.